Delving into Inter-Image Invariance for Unsupervised Visual Representations
Abstract
Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully-designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup
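To make the terms "contrastive loss" and "decision boundary" concrete, the following is a minimal sketch of an InfoNCE-style loss with an additive margin on the positive similarity. It is not taken from the paper or from the mmselfsup code base: the function names, the default temperature, and the choice of an additive margin as the way to tighten or loosen the boundary are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_contrastive_loss(anchor, positive, negatives,
                            temperature=0.2, margin=0.0):
    """InfoNCE-style loss for one anchor (hypothetical helper).

    margin > 0 makes the boundary more stringent (the positive must beat
    the negatives by a larger gap); margin < 0 makes it less stringent.
    """
    pos_logit = (cosine(anchor, positive) - margin) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits))
    # Negative log-probability assigned to the positive pair.
    return log_sum - pos_logit


anchor = [1.0, 0.0]
positive = [1.0, 0.0]      # e.g. another view, or a same-pseudo-label image
negatives = [[0.0, 1.0]]   # e.g. an image with a different pseudo-label
loss = margin_contrastive_loss(anchor, positive, negatives, temperature=1.0)
```

In this sketch, for the same embeddings a negative margin yields a smaller loss than a zero margin, and a positive margin yields a larger one, which is one way to read "less stringent" versus "more stringent" boundaries.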
Similar resources
Delving Deeper into Convolutional Networks for Learning Video Representations
We propose an approach to learn spatio-temporal features in videos from intermediate visual representations we call “percepts” using Gated-Recurrent-Unit Recurrent Networks (GRUs). Our method relies on percepts that are extracted from all levels of a deep convolutional network trained on the large ImageNet dataset. While high-level percepts contain highly discriminative information, they tend t...
Image representations for visual learning.
Computer vision researchers are developing new approaches to object recognition and detection that are based almost directly on images and avoid the use of intermediate three-dimensional models. Many of these techniques depend on a representation of images that induces a linear vector space structure and in principle requires dense feature correspondence. This image representation allows the use...
Unsupervised Learning of Visual Representations using Videos
This is a review of unsupervised learning applied to videos with the aim of learning visual representations. We look at different realizations of the notion of temporal coherence across various models. We try to understand the challenges being faced, the strengths and weaknesses of different approaches and identify directions for future work. ...
Unsupervised learning of a steerable basis for invariant image representations
There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many...
Journal
Journal title: International Journal of Computer Vision
Year: 2022
ISSN: 0920-5691, 1573-1405
DOI: https://doi.org/10.1007/s11263-022-01681-x